This paper reports the results of an empirical study on the tooth breakage failure mode in spur gears. Of the four dominant gear failure modes (breakage, wear, pitting, and scoring), tooth breakage is the most precipitous and often leads to catastrophic failures. Instead of traditional notching with wire electrical discharge machining (EDM), the cracks were initiated using a fatigue tester and a custom-designed single-tooth bending fixture that simulates overload conditions. The cracks were then propagated on a dynamometer, and the ground truth of the damage level during crack propagation was monitored with crack-propagation sensors. Ten crack propagations were performed to compare existing condition indicators (CIs) with respect to their ability to detect a crack, their ability to assess the damage, and their sensitivity to sensor placement. Of more than thirty computed CIs, this paper compares five commonly used ones: raw RMS, FM0, NA4, raw kurtosis, and NP4. The performance of combined CIs was also investigated using feature fusion based on linear regression, logistic regression, and boosted regression trees.


Introduction 

The U.S. Army aims to transition its fleet of vehicles from time-based to condition-based maintenance in an effort to improve safety and reduce costs. The premise is that critical components are serviced when indicators reveal that they can no longer function as designed. A major focus has been the helicopter transmission, and researchers have proposed vibration-based CIs to detect faulty components, particularly bearings and gears (Refs. 1-10). However, damage cases with which to evaluate and validate these CIs are scarce. This research addresses the need for seeded-fault data on gear tooth cracks by developing a well-documented, statistically significant database of cracks monitored from inception to failure.

Gear service life can be divided into two phases: crack initiation and crack propagation (Refs. 11-13). Because crack initiation takes considerably longer than propagation, an accelerated method was employed in which a single tooth was subjected to a cyclic force above its intended operating range (Refs. 14, 15). Traditional crack seeding uses wire electrical discharge machining (EDM) to notch the gear (Refs. 16, 17). In this study, however, a single tooth was subjected to fatigue crack initiation. A potential advantage of fatigue-based crack seeding over notching is that the radius of curvature of the initiated cracks is considerably smaller. The smaller cracks propagate faster and better represent real field failures. Seeding cracks has an additional advantage: it isolates the location of breakage and thus makes monitoring of the ground truth manageable.

The objective of the current study is to compare the performance of vibration-based CIs in detecting incipient cracks in gear teeth. It extends the methodology described in (Ref. 15), in which crack initiation was conducted on a high-cycle fatigue test rig and the cracks were propagated on a four-square spur gear fatigue test rig. The current study also employed crack-propagation sensors (CPS) to obtain ground truth information on the damage, as was done in (Ref. 18).

Although many CIs are computed and stored in the database, this report focuses on five. The rationale for selecting this subset is as follows. Root mean square (RMS) and kurtosis (Kur) were chosen for their simplicity and long history of use in vibration monitoring. FM0 was selected as one of the first CIs, proposed by Stewart in 1977 (Ref. 5). NA4 was chosen because it has had considerable success since being introduced by Zakrajsek et al. in 1998 (Ref. 3). Finally, NP4, a more recent feature introduced by Polyshchuk et al. in 2003, uses a time-frequency method (Ref. 9). Descriptions of many CIs can be found in (Refs. 1, 2).

Table 1 summarizes the formulae for the five CIs considered. RMS, kurtosis, and NP4 are based on raw signals, while NA4 and FM0 are based on time-synchronously averaged data. In the formulae, $x(t)$ is the measured signal, $N$ is the number of data points, $PP$ is the peak-to-peak amplitude of $x(t)$, $A_k$ is the amplitude of gear harmonic $k$, $r(t)$ is the residual signal derived by removing known frequency components, $P_n$ is the signal power, $n$ represents the number of gear mesh harmonics removed in $r(t)$, and $M$ is the current time record in the run ensemble.


Table 1: CI Table

Metric    Formula
RMS       $\sqrt{\frac{1}{N}\sum_{t=1}^{N} x(t)^2}$
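As a hedged illustration of the quantities defined above, the sketch below implements commonly cited literature definitions of RMS, kurtosis, FM0, and NA4 in plain Python. It assumes these textbook forms match the ones used in this study; the function names are illustrative, the gear mesh harmonic amplitudes $A_k$ are assumed to be precomputed (for example from a spectrum of the time-synchronously averaged signal), and NP4, which relies on a time-frequency distribution, is not shown.

```python
import math


def rms(x):
    """Root mean square of a raw signal x(t) with N samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Normalized fourth central moment (non-excess kurtosis) of a signal."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    return n * m4 / m2 ** 2


def fm0(tsa, harmonic_amplitudes):
    """FM0: peak-to-peak amplitude (PP) of the time-synchronously averaged
    signal divided by the sum of the gear mesh harmonic amplitudes A_k."""
    peak_to_peak = max(tsa) - min(tsa)
    return peak_to_peak / sum(harmonic_amplitudes)


def na4(residual_records):
    """NA4: fourth moment of the current residual record r(t), normalized by
    the squared average variance of all M records in the run ensemble so far.
    residual_records[-1] is the current record."""
    current = residual_records[-1]
    n = len(current)
    mean = sum(current) / n
    m4 = sum((v - mean) ** 4 for v in current)
    sums_of_squares = []
    for record in residual_records:
        mu = sum(record) / len(record)
        sums_of_squares.append(sum((v - mu) ** 2 for v in record))
    avg_sum_of_squares = sum(sums_of_squares) / len(sums_of_squares)
    return n * m4 / avg_sum_of_squares ** 2
```

With a single record in the ensemble (M = 1), NA4 reduces to the kurtosis of the residual, which is a quick sanity check on the normalization.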
Test Description

The gears were designed and manufactured according to NASA drawings and specifications, with 28 teeth, a diametral pitch of 8, and a pressure angle of 20 degrees. More details on the specifications can be found in Townsend and Shimski (Ref. 19). Figure 1 depicts a flowchart of the testing procedure. For each test set, the two test gears were spun together for 54.5 s in a gearbox instrumented with four accelerometers, whose positions are shown in Figure 2. Some accelerometers were placed at locations very sensitive to gear cracks, and others at locations expected to have relatively poor sensitivity to gear cracks. This sub-optimal placement was motivated by the fact that, in practical applications, space and other constraints force pragmatic, sub-optimal sensor placement. The torque and angular speed profile is given in Figure 3; the repeating cycle is indicated in the graph with T. This run established the baseline for the test. The context (operating condition) profiles were designed to propagate the cracks effectively, but also to gain an understanding of the effect of speed and torque on gear fault detection CIs. The dynamic portion of the profile is used to determine the dependence of CIs on changing operating conditions. Only CI values for the fixed operating conditions of T = 170 ft-lb and ω = 1500 rpm are presented.

Fig. 1: Testing Flowchart
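Selecting CI values at the fixed operating point (T = 170 ft-lb, 1500 rpm) amounts to filtering samples by the measured torque and speed. A minimal sketch follows; the `CiSample` record and the tolerance bands are hypothetical choices for illustration, not values from this study.

```python
from dataclasses import dataclass


@dataclass
class CiSample:
    """One CI value with the operating condition at which it was computed."""
    time_s: float
    torque_ftlb: float
    speed_rpm: float
    value: float


def at_operating_point(samples, torque_ftlb=170.0, speed_rpm=1500.0,
                       torque_tol=5.0, speed_tol=15.0):
    """Keep only samples recorded within a tolerance band around the fixed
    operating point; the tolerance values are illustrative assumptions."""
    return [
        s for s in samples
        if abs(s.torque_ftlb - torque_ftlb) <= torque_tol
        and abs(s.speed_rpm - speed_rpm) <= speed_tol
    ]
```

Samples taken during the dynamic portion of the profile fall outside both bands and are dropped.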

After the baseline tests were completed, the top gear was removed from the gearbox, mounted in the single-tooth fatigue-tester fixture shown in Figure 4, and subjected to a cyclic load of 100 to 3100 lb at 10 Hz. The anvil applied a normal force to a single tooth at the highest point of single tooth contact (HPSTC), as shown in Figure 4c. The crack size was controlled using the measured compliance of the fatigue rig as feedback (Ref. 14). The goal was to create cracks of approximately equal size.
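Compliance feedback of this kind can be sketched as follows, assuming compliance is taken as the displacement range divided by the load range over one load cycle, and that initiation is flagged when compliance rises a fixed fraction above an early-cycle baseline. The 10-cycle baseline and the 2% rise threshold are hypothetical placeholders; the actual control criterion is described in Ref. 14.

```python
def cycle_compliance(loads_lb, displacements_in):
    """Compliance of one load cycle: displacement range over load range (in/lb)."""
    load_range = max(loads_lb) - min(loads_lb)
    disp_range = max(displacements_in) - min(displacements_in)
    return disp_range / load_range


def crack_initiated(compliances, baseline_cycles=10, rise_fraction=0.02):
    """Return the index of the first cycle whose compliance exceeds the
    baseline (mean of the first cycles) by rise_fraction, or None.
    Both parameters are hypothetical, not values used in this study."""
    baseline = sum(compliances[:baseline_cycles]) / baseline_cycles
    for i, c in enumerate(compliances[baseline_cycles:], start=baseline_cycles):
        if c > baseline * (1.0 + rise_fraction):
            return i
    return None
```

A crack reduces the tooth stiffness, so a rising compliance is a natural stopping signal; tuning the threshold is what would make the initiated cracks comparable in size.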



Fig. 2: Dynamometer test stand (gearbox) with the location of 
the accelerometers. 




Fig. 3: The segment of the dynamometer operating conditions, viz. torque and angular speed. The repeating cycle is indicated in the graph with T.

The initiated cracks were verified non-destructively using fluorescent magnetic particle inspection. Before inspection, the gear was placed in a custom fixture, shown in Figure 5, designed to load the tooth in tension at approximately 500 lb. The tooth was covered with a layer of fluorescent iron powder, placed in a strong magnetic field, and observed under ultraviolet (UV) light. The magnetic particles coalesce at the flux leakage caused by the crack and become detectable, as shown in Figure 6a.

Once the crack was initiated and verified, the gear was equipped with two crack-propagation sensors, one on each face, as shown in Figure 6b. The gear was then reassembled in the gearbox and operated according to the same profile shown in Figure 3. Testing was concluded when the crack propagated beyond the range of both crack-propagation sensors. The propagation times ranged from 39 minutes to over 3 days, and a total of 10 sets of gears were tested, as shown in Table 2. Often, the system needed to be shut down in the evening and restarted in the morning. Propagation was run until the crack propagation sensors indicated full propagation. At the completion of each test, the tooth was completely severed using the fatigue tester in order to gain a metallurgical understanding of the fatigue cracks. The bottom surface of the severed tooth was then imaged, as shown in Figure 7.

Fig. 4: Fatigue-tester-based single-tooth fixture: (a) CAD model, (b) photo of the built fixture, (c) the force is applied approximately perpendicularly at the HPSTC.

Fig. 5: Non-destructive method for crack verification: (a) imaging system, (b) custom gear fixture.





Fig. 7: The bottom surface of a cracked tooth.

CI Database

A MySQL database was developed to serve the gear research community by providing the ground truth for further development of diagnostics and prognostics of the gear crack failure mode based on vibration signals. The tables containing measured data cannot be altered by the user, but the database schema allows the user to recompute CIs as well as to append and compute additional CIs. The database tables are shown in Figure 8. The crack initiation data are stored in two main tables: instron_test_details and instron_test_data. The former contains information on the test parameters (date, force magnitude, force frequency, data acquisition parameters, etc.), and the latter contains the measurements (force, displacements, computed compliance, etc.). Similarly, the dynamometer data have separate tables for test parameters and for data. However, the high-frequency data bursts are stored in linked binary files.

Table 2: Summary of test runs

GearID    Total propagation time [min]
13        1809.1
16        255.8
20        506.2
104       4693.3
106       614.5
108       52.6
112       99.8
114       79.9
116       47.5
118       38.7

CI Computation and Analysis

The CIs are compared with respect to their ability to detect a crack early, their ability to assess the damage, and their sensitivity to sensor placement. While one can speculate that these CI attributes may be related, and expect that a CI that is more sensitive initially will remain more sensitive throughout the propagation, there is currently no evidence to support this. Moreover, a previous study (Ref. 10) suggested fusing features that are more sensitive to crack initiation with features that are more sensitive to crack propagation.


CI Comparison: Early Crack Detection

Crack detection is examined from two points of view: 1) the ability to detect cracks early, and 2) the ability to rapidly achieve high confidence in the crack as it grows. To compare the CIs' ability to detect the crack early, the features computed during the baseline test were labeled as no fault, and the features computed during the propagation test prior to the first detected CP sensor strand breakage were labeled as fault.

Fig. 8: The main tables in the database, grouped as main fatigue-testing-related data (instron_test_details, instron_test_data, instron_test_statistics), main dynamometer data (dyno_test_details, dyno_test_portdata, dyno_test_burst_details), and features and images (dyno_test_features, gears_photos, and a feature-description table).

Fig. 9: RMS: (a) Computing an ROC point (pFA, pDet). (b) ROC and $AUC_{ij}$.
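The labeling rule used for early detection can be written down directly. In this sketch, each CI sample is assumed to be a `(time_s, value)` pair and `first_break_time` is the time of the first CP strand breakage; propagation samples after that time are simply excluded from the early-detection set, which is an assumed reading of the rule rather than a documented detail.

```python
def early_detection_set(baseline, propagation, first_break_time):
    """Build (ci_value, label) pairs for early-detection ROC analysis.

    baseline, propagation: lists of (time_s, ci_value) pairs.
    Label 0 = no fault (baseline test); label 1 = fault (propagation test,
    before the first CP strand breakage). Later samples are excluded.
    """
    labeled = [(value, 0) for _, value in baseline]
    labeled += [(value, 1) for t, value in propagation if t < first_break_time]
    return labeled
```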

Receiver operating characteristic (ROC) plots were used for performance comparisons. Two probability density functions (PDFs) of each CI are produced from the data: one associated with the known undamaged case and one with the known damaged case, as shown in Figure 9a. The ROC can be thought of as a measure of the overlap of the two PDFs: as the detection threshold is varied, the ratio of detected faults to all faults is plotted against the ratio of false detections (Refs. 20-22). Figure 9a illustrates what a point on an ROC curve represents; the CI used in this example is RMS. An example ROC curve with the associated area under the curve (AUC) is shown in Figure 9b. Figure 10a illustrates the dependence of the resulting ROC on sensor placement, and Figure 10b shows how the ROC varies across gears using the signal from accelerometer $a_1$.
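An empirical ROC and its AUC can be computed directly from the two labeled CI samples. The sketch below assumes that a larger CI value indicates a fault, sweeps the threshold over every observed value, and integrates with the trapezoidal rule; it is a generic illustration, not the study's processing code.

```python
def roc_curve(no_fault, fault):
    """Empirical ROC: declare 'fault' when CI >= threshold, sweeping the
    threshold over all observed values from high to low.
    Returns (pFA, pDet) points from (0, 0) to (1, 1)."""
    thresholds = sorted(set(no_fault) | set(fault), reverse=True)
    points = [(0.0, 0.0)]
    for thr in thresholds:
        p_fa = sum(v >= thr for v in no_fault) / len(no_fault)
        p_det = sum(v >= thr for v in fault) / len(fault)
        points.append((p_fa, p_det))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For perfectly separated samples the AUC is 1, and for identical samples it is 0.5, consistent with reading the ROC as a measure of overlap between the two PDFs.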

These plots provide an initial glimpse of the ability of different CIs to detect small cracks. The ROC is a two-dimensional depiction of classifier performance, which makes these plots somewhat busy for multiple CIs and multiple accelerometers. A more compact comparison is enabled by the area under the ROC curve, abbreviated as $AUC_{ij}$ for gear $i$ and accelerometer $j$ (Ref. 23). The AUC is a single scalar obtained by integrating the ROC. A higher AUC indicates better performance, with AUC = 1 being the maximum, associated with perfect detection. The summary comparison plot for early detection is shown in Figure 11a, which plots $M_{AUC,i}$ vs. CI, where

$$M_{AUC,i} = \max_{j} AUC_{ij} \tag{1}$$

Each gear is shown with a unique marker. A histogram is also provided to indicate the number of overlapping markers.

Fig. 10: RMS: (a) ROCs from four different accelerometers and a single gear (b) ROCs from several gears (using $a_1$).
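Eq. (1) and the accompanying histogram reduce to a maximum over accelerometers and a simple binning of the per-gear values. In this sketch, `auc_table` (a mapping from gear ID to its list of $AUC_{ij}$) and the 10-bin layout are hypothetical.

```python
def max_auc_per_gear(auc_table):
    """Eq. (1): M_AUC,i = max over accelerometers j of AUC_ij.
    auc_table maps gear ID -> list of AUC_ij (one entry per accelerometer)."""
    return {gear: max(aucs) for gear, aucs in auc_table.items()}


def histogram(values, n_bins=10, lo=0.0, hi=1.0):
    """Count values per equal-width bin on [lo, hi]; values are assumed to lie
    in [lo, hi], and a value equal to hi falls in the last bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts
```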

Within our sample, raw RMS exhibits the best average performance and the smallest gear-to-gear variation, and it does not show the large outliers seen for the other features. FM0, NA4, kurtosis, and NP4 all have cases of false positives, which manifest as $M_{AUC,i}$ outliers. It is also interesting to note that different CIs perform differently on different crack propagations. For example, the crack on GearID = 114 was missed by NP4 and kurtosis, but not by FM0, NA4, and RMS.

Fig. 11: Summary performance of different features: (a) early detection, (b) entire propagation.


It is of interest to see how the performance improves over time. Figure 11b shows the performance of the CIs when all crack propagation data are taken into account. While the performance of all indicators improved, RMS was still the only one without significant outliers. However, NA4 and NP4 show peaks in the $M_{AUC,i}$ distributions at high probability of detection (POD), indicating better performance for a subset of gears. Also, as mentioned above, the CIs seem complementary. Thus, the results confirm earlier suggestions that detectors employing more than one CI are promising.


To demonstrate the benefit of CI fusion, the complementary nature of CI information was evaluated through a panel of feature fusion experiments. In these experiments, the CIs were fused via three different predictive models: linear regression (LNR) (Ref. 24), logistic regression (LGR) (Ref. 25), and boosted regression trees (BRT) (Ref. 26). LNR and LGR each fit a single lower-dimensional model to the data: LNR finds a two-parameter linear (slope and intercept) model that minimizes the sum of squared errors between modeled and observed values, while LGR finds a similar parametrization for a logit function in place of the linear function. The logit function, logit(x), is the inverse of the sigmoid function and is defined as


$$\mathrm{logit}(x) = \ln\left(\frac{x}{1-x}\right) \tag{2}$$


BRT, in contrast, learns a fixed (constant) number of lower-dimensional models over discrete regions of the feature space, forming a single piecewise function. In the experiments performed, the BRT region boundaries and individual region functions were linear functions, with parameters selected by adaptive boosting and the number of regions fixed at 10. For all algorithms, the regression targets were based on the observed wire breakages, and final classification was performed by thresholding the predicted output value for a given set of input CI readings. In all cases, the three top-performing CIs were used as inputs, under the hypothesis that this arrangement should outperform single-CI crack detection. Features computed from one accelerometer were fused in these experiments, although additional benefit may be achieved by fusing features computed from vibration signals at different locations. As before, the largest of the four AUCs was selected.
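As one concrete instance of the LGR fusion described above, the sketch below fits a standard logistic regression by batch gradient descent to three-CI feature vectors (for example RMS, NA4, NP4) with 0/1 targets, then scores the fused output with a rank-based AUC. The learning rate, epoch count, and binary targets are assumptions; the study's exact targets, thresholding, and BRT model are not reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (inverse of logit)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit p(fault | x) = sigmoid(w . x + b) by batch gradient descent on the
    mean log-loss. X: list of CI vectors, y: list of 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            for k in range(d):
                grad_w[k] += err * xi[k] / n
            grad_b += err / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fused_score(w, b, x):
    """Fused CI: predicted fault probability for one CI vector x."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


def auc_rank(no_fault_scores, fault_scores):
    """AUC as the probability that a fault score exceeds a no-fault score
    (ties count one half)."""
    wins = 0.0
    for f in fault_scores:
        for nf in no_fault_scores:
            wins += 1.0 if f > nf else (0.5 if f == nf else 0.0)
    return wins / (len(fault_scores) * len(no_fault_scores))
```

Thresholding `fused_score` (for example at 0.5) turns the fused output into a crack/no-crack decision, mirroring the thresholding step described above.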

Figure 12a shows the results after fusing three of the best-performing CIs: RMS, NA4, and NP4. These results show that the fused performance tracks that of the best constituent CIs, but that CIs with strongly varying performance can reduce the overall scores and introduce variance into the final result.


CI Comparison: Sensor Placement


To compare CIs with respect to their sensitivity to accelerometer placement, plots are produced of the standard deviation of the AUCs for individual cracks, defined as

$$\sigma_{AUC,i} = \sqrt{\frac{1}{4}\sum_{j=1}^{4}\left(AUC_{ij} - \mu_{AUC,i}\right)^2} \tag{3}$$

where $\mu_{AUC,i}$ is their mean, defined by

$$\mu_{AUC,i} = \frac{1}{4}\sum_{j=1}^{4} AUC_{ij} \tag{4}$$

Fig. 12: $M_{AUC,i}$ after fusing top performers (raw RMS + NA4 + NP4).

Figure 13a illustrates the process for four selected gears (GearID = 108, 112, 114, and 116). The gears are indicated on the x-axis and the $AUC_{ij}$ values on the y-axis. The source accelerometers, denoted in the plot as $a_1$ through $a_4$, are consistently labeled with unique markers. In addition, their mean, $\mu_{AUC,i}$, is also plotted and labeled with an 'x' marker. The values of the mean and of the standard deviation $\sigma_{AUC,i}$ are also indicated in the plot.
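Eqs. (3) and (4) for a single gear can be sketched as follows. The population (1/4) normalization of the standard deviation is an assumption; a sample (1/3) normalization would scale every $\sigma_{AUC,i}$ by the same factor and leave the comparison between CIs unchanged.

```python
import math


def placement_stats(aucs):
    """Eqs. (3)-(4) for one gear i: aucs holds AUC_ij for the accelerometers j.
    Returns (mu_AUC_i, sigma_AUC_i) using the population (1/m) normalization."""
    m = len(aucs)
    mu = sum(aucs) / m
    sigma = math.sqrt(sum((a - mu) ** 2 for a in aucs) / m)
    return mu, sigma
```

Applying this per gear and per CI yields one point per gear and CI in the sensor-placement summary.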

If a CI were perfectly insensitive to sensor placement, the standard deviation would be zero (ignoring minute differences due to sensor calibration and the processing channels), because CIs originating from differently placed accelerometers would be indistinguishable. Figure 13b shows the summary plot of the standard deviations of the AUCs of features computed from data captured by differently placed accelerometers for the same propagations. The CIs exhibit similar dependence on sensor placement, as estimated by $\sigma_{AUC,i}$, but raw kurtosis seems slightly better than the others. The dependence varies from sample to sample, and raw RMS displays the least gear-to-gear variation. Note that very low sensitivity to sensor placement often corresponds to a missed alarm. For example, in Figure 13b, GearID = 16 has the smallest $\sigma_{AUC,i}$ for FM0, but Figure 11b shows that this CI missed the crack.


CI Comparison: Damage Assessment

CIs are correlated with the estimated crack size in order to compare them with respect to their ability to assess the damage. The first-order ground truth information on damage was contained in the crack propagation sensor signals, $CP_1$ and $CP_2$. Figure 14 is a schematic of the CPSs. There are two CP sensors, one on each gear face. Figure 15 is an example output. The resulting voltage is noisy and requires averaging and some signal processing, including filtering and peak detection. The breaking of CP wires produces the recorded voltage levels shown in Figure 16a. The spacing between strand centerlines is 0.25 mm. The CPS estimates are saved in the same database table as the CIs, dyno_test_features.

Fig. 13: (a) $AUC_{ij}$ vs. GearID for four accelerometers (b) Sensitivity of CIs to sensor placement

Fig. 14: Crack Propagation Sensor circuit schematic

Fig. 15: CPS signal (example from GearID 106, run 23)
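A simplified version of the CPS processing chain can be sketched as smoothing followed by mapping each smoothed voltage to the nearest theoretical level. This substitutes nearest-level quantization for the filtering and peak-detection steps mentioned above, and it assumes a hypothetical `levels` table in which entry k is the expected voltage with k strands broken; the crack extent is approximated as broken strands times the 0.25 mm spacing, ignoring any offset of the first strand.

```python
STRAND_SPACING_MM = 0.25  # strand centerline spacing stated in the text


def moving_average(signal, window):
    """Trailing moving average; the first outputs average fewer samples."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def broken_strands(voltage, levels):
    """Index of the theoretical level nearest to the voltage; levels[k] is the
    (hypothetical) expected voltage with k strands broken."""
    return min(range(len(levels)), key=lambda k: abs(levels[k] - voltage))


def crack_extent_mm(raw_signal, levels, window=5):
    """Approximate crack extent as broken strands times the strand spacing,
    ignoring any offset between the first strand and the crack origin."""
    smoothed = moving_average(raw_signal, window)
    return [broken_strands(v, levels) * STRAND_SPACING_MM for v in smoothed]
```

Because a broken strand cannot recover, a practical version might also enforce a non-decreasing strand count over time.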

The $CP_1$-$CP_2$ plane is a parameter space spanned by the voltages of the two CP sensors. It maps directly onto the crack lengths measured on the two gear faces. Figure 16b shows three different crack propagations (GearID = 108, 114, and 116) and illustrates that a crack can propagate fairly symmetrically (Gear 108), but also asymmetrically (Gears 114 and 116). Note that the asymmetries for 114 and 116 are in different directions.


Fig. 16: (a) Theoretical CP levels vs. number of broken strands (b) Three crack surface paths in the $CP_1$-$CP_2$ plane.

The following plots show each CI as a function of crack length. All CIs are computed from the data of the most sensitive accelerometer. Figures 17a-b plot raw RMS vs. $CP_1$/$CP_2$, and Figures 17c-d plot FM0 vs. $CP_1$/$CP_2$. In a like manner, Figures 18a-b plot the results for NA4 and Figures 18c-d those for kurtosis. The NP4 results are given in Figure 19. All CIs showed relatively weak dependence on the level of damage, as estimated by the surface sensors $CP_1$ and $CP_2$.
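The text does not name the correlation statistic used. A minimal sketch assuming the Pearson coefficient, applied to one CI series and a matching crack-size series (for example a CP-based extent in mm), is shown below; a rank-based measure such as Spearman's would be a reasonable alternative if the relationship is monotonic but nonlinear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. CI values and the estimated crack size at the same times."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near zero for every CI would correspond to the weak dependence on damage reported here.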
Fig. 17: Dependence of CIs on the damage level, measured by $CP_1$ and $CP_2$, for four representative propagations. RMS and FM0 as a function of $CP_1$/$CP_2$ (o: 108, □: 114, >: 116).

Conclusions 

The research project consisted of accelerating crack growth in spur gears and measuring vibrations with accelerometers mounted on the gearbox. The cracks were initiated using a fatigue tester. Crack propagation sensors, mounted on each face of the cracked tooth, captured the ground truth information on crack propagation. Signals from four accelerometers and the tachometer were used to compute condition indicators. Five CIs were selected from a larger set and compared with respect to their ability to detect small cracks, their sensitivity to sensor placement, and their ability to assess the damage. While simple raw RMS was found to be the most robust for early crack detection, more advanced CIs exhibited higher AUCs for a subset of gears. In addition, evidence of the complementary detection power of different features encouraged feature fusion for improved performance. It was found that combining all features does not always improve the overall performance. The compared features showed similar sensitivity to sensor placement. The initial analysis did not detect a consistent, significant sensitivity to damage in any of the selected CIs.